On Supernode Transformation with Minimized Total Running Time
Abstract
With the objective of minimizing the total execution time of a parallel program on a distributed-memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths for a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and numbers of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order-n polynomial whose real positive roots include the optimal supernode size. For two special cases, (1) two-dimensional algorithm problems and (2) n-dimensional algorithm problems where the communication cost is dominated by the startup penalty and can therefore be approximated by a constant, we give a closed-form expression for the optimal supernode size that is independent of the supernode relative side lengths and cutting hyperplanes. For the case where the algorithm iteration index space and the supernodes are hyperrectangular, we give closed-form expressions for the optimal supernode relative side lengths. Our experiment shows a good match between the closed-form expressions and the experimental data.
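The abstract does not reproduce the paper's cost function, so the following is only a toy illustration of the trade-off being optimized, under simplifying assumptions that are not taken from the paper: an N × N iteration space with uniform dependencies (1,0) and (0,1), square supernodes of side s (supernode size s²), execution in wavefronts along i + j with no limit on processors, and a constant per-step message startup t_startup (the startup-dominated case). The helper names `total_time` and `best_side` are hypothetical, and the brute-force search stands in for the paper's polynomial and closed-form results.

```python
# Toy cost model (hypothetical, NOT the paper's): an n x n iteration space with
# dependencies (1,0) and (0,1) is cut into square s x s supernodes. Supernodes
# execute in wavefronts along i + j; each wavefront step costs s*s*t_comp for
# computation plus one constant message startup t_startup.

def total_time(n: int, s: int, t_comp: float, t_startup: float) -> float:
    """Estimated running time with square supernodes of side s (s divides n)."""
    blocks = n // s                 # supernodes per dimension
    steps = 2 * blocks - 1          # wavefronts along the i + j schedule
    return steps * (s * s * t_comp + t_startup)


def best_side(n: int, t_comp: float, t_startup: float) -> int:
    """Brute-force search over side lengths that tile the n x n space exactly."""
    candidates = [s for s in range(1, n + 1) if n % s == 0]
    return min(candidates, key=lambda s: total_time(n, s, t_comp, t_startup))


if __name__ == "__main__":
    for t_startup in (0.0, 50.0, 1e9):
        s = best_side(64, t_comp=1.0, t_startup=t_startup)
        print(f"t_startup={t_startup:g}: best side {s}, supernode size {s * s}")
```

With t_comp = 1 and t_startup = 50 on a 64 × 64 space, this toy search picks side 8 (supernode size 64): smaller supernodes pay for many more startups, while larger ones make each wavefront step longer and shorten the available parallelism. With zero startup cost the smallest supernode wins, and with a huge startup cost a single supernode wins.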
Similar resources
On Time Optimal Supernode Shape
With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses the selection of an optimal supernode shape of a supernode transformation (also known as tiling). We assume that the communication cost is dominated by the startup penalty and therefore, can be approximated by a constant. We identify three parameters of...
On Optimal Size and Shape of Supernode Transformations
Supernode transformation has been proposed to reduce the communication startup cost by grouping a number of iterations in a perfectly nested loop with uniform dependencies as a supernode, which is assigned to a processor as a single unit. A supernode transformation is specified by n families of hyperplanes which slice the iteration space into parallelepiped supernodes, the grain size of a supe...
An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters
In recent years, PC-based clusters have become a mainstream branch of high performance computing (HPC) systems. Like other systems supporting HPC, one of the most important concerns in PC-based clusters is how to improve response time, throughput, and utilization. Therefore, scheduling can have a significant impact on performance characteristics of the system. This paper focuses on building an ada...
On Supernode Transformations And Multithreading For The Longest Common Subsequence Problem
The longest common subsequence (LCS) problem is an important algorithm in computer science with many applications such as DNA matching (bio-engineering) and file comparison (UNIX diff). While there has been a lot of research for finding an efficient solution to this problem, the research emphasis has shifted with the advent of multicore architectures towards multithreaded implementations. This ...
Data Parallel Code Generation for Arbitrarily Tiled Loop Nests
Tiling or supernode transformation is extensively discussed as a loop transformation to efficiently execute nested loops onto distributed memory machines. In addition, a lot of work has been done concerning the selection of a communication-minimal and a scheduling-optimal tiling transformation. However, no complete approach has been presented in terms of implementation for non-rectangularly til...
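A supernode transformation groups iterations of a perfectly nested loop with uniform dependencies into supernodes that are scheduled as single units. A minimal sketch of the hyperrectangular special case, assuming axis-aligned cutting hyperplanes and fixed side lengths per dimension, is shown below; the helper names `supernode_of`, `tile`, and `supernode_dependences` are hypothetical and illustrate the general idea rather than any specific paper's implementation.

```python
from collections import defaultdict
from itertools import product


def supernode_of(point, sides):
    """Index of the hyperrectangular supernode containing an iteration point."""
    return tuple(p // b for p, b in zip(point, sides))


def tile(extents, sides):
    """Group every iteration of a perfectly nested loop into supernodes."""
    groups = defaultdict(list)
    for point in product(*(range(e) for e in extents)):
        groups[supernode_of(point, sides)].append(point)
    return groups


def supernode_dependences(extents, sides, deps):
    """Distinct supernode-level dependence vectors induced by uniform deps.

    Only dependencies that cross a supernode boundary produce an entry;
    dependencies inside a supernode are satisfied by its sequential execution.
    """
    result = set()
    for point in product(*(range(e) for e in extents)):
        for d in deps:
            target = tuple(p + q for p, q in zip(point, d))
            if all(0 <= t < e for t, e in zip(target, extents)):
                src = supernode_of(point, sides)
                dst = supernode_of(target, sides)
                if src != dst:
                    result.add(tuple(b - a for a, b in zip(src, dst)))
    return result


if __name__ == "__main__":
    groups = tile((6, 6), (3, 2))
    print(len(groups), "supernodes of", len(groups[(0, 0)]), "iterations each")
    print(sorted(supernode_dependences((6, 6), (3, 2), [(1, 0), (0, 1), (1, 1)])))
```

For a 6 × 6 iteration space cut into 3 × 2 supernodes, this yields six supernodes of six iterations each, and the uniform dependencies (1,0), (0,1), (1,1) induce only the supernode-level vectors (0,1), (1,0), (1,1). Because every component is nonnegative, each supernode can be executed atomically in this example.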
Journal:
- IEEE Trans. Parallel Distrib. Syst.
Volume 9
Publication year: 1996